Hugging Face's LLM post-training library TRL has reached v1.0. The new Stable/Experimental API tiers, the stabilization of the GRPO, DPO, and SFT trainers, and a roadmap that includes asynchronous GRPO all point to a more mature stack.
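DPO, one of the now-stable trainers, optimizes a simple preference objective. A minimal sketch of that loss in plain Python (the log-probabilities below are toy numbers, not real model outputs):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (pi_logratio - ref_logratio)
    return -math.log(1 / (1 + math.exp(-logits)))  # -log sigmoid

# Toy numbers: the policy prefers the chosen answer more strongly
# than the reference model does, so the loss dips below
# -log(0.5) ~= 0.693.
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
print(round(loss, 4))
```

TRL's `DPOTrainer` computes this same objective over batches of (chosen, rejected) pairs; the sketch only shows the per-pair math.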
Chroma has published a 20B-parameter self-editing search agent. It performs multi-hop search while dynamically pruning its context, and matches or exceeds the accuracy of frontier models at roughly 1/10 the cost and up to 10x lower latency. The weights are released under the Apache 2.0 license.
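The multi-hop-search-with-pruning loop can be illustrated with a toy sketch. The corpus, scoring function, and context budget here are invented for illustration; the real agent edits its context with a learned policy, not word overlap:

```python
# Toy multi-hop search: each hop retrieves one new passage, then
# low-scoring passages are pruned so the context stays in budget.
CORPUS = {
    "d1": "Paris is the capital of France.",
    "d2": "France borders Spain.",
    "d3": "The Eiffel Tower is in Paris.",
    "d4": "Spain's capital is Madrid.",
}

def score(query, text):
    """Crude relevance score: count of shared lowercase words."""
    words = text.lower().replace(".", "").split()
    return len(set(query.lower().split()) & set(words))

def multi_hop_search(query, hops=2, budget=2):
    context = []  # list of (doc_id, text) currently kept
    for _ in range(hops):
        # Retrieve: add the best-matching document not yet seen.
        seen = {d for d, _ in context}
        candidates = [(d, t) for d, t in CORPUS.items() if d not in seen]
        candidates.sort(key=lambda dt: score(query, dt[1]), reverse=True)
        if candidates:
            context.append(candidates[0])
        # Prune: keep only the `budget` most relevant passages.
        context.sort(key=lambda dt: score(query, dt[1]), reverse=True)
        context = context[:budget]
    return [d for d, _ in context]

print(multi_hop_search("capital of France"))  # → ['d1', 'd2']
```

The pruning step is what keeps cost and latency flat as hops accumulate: the context never grows past the budget, regardless of how many searches run.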
Cursor released Composer 2 without disclosing its base model; probing its OpenAI-compatible API revealed it to be Kimi K2.5. This escalated into a licensing dispute, but a formal commercial agreement with Moonshot AI was subsequently confirmed.
Hugging Face published a comparative analysis of 16 open-source RL training libraries along 7 design axes. In synchronous designs, GPU utilization hovers around 60% because generation is the bottleneck, while an asynchronous, disaggregated design can push it above 95%.
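The utilization gap follows from back-of-the-envelope arithmetic. The step times below are invented for illustration; only the ratio matters:

```python
# Toy model of trainer-GPU utilization in RL training.
# Assumed per-step times: rollout generation 40 s, training update 60 s.
gen_time, train_time = 40.0, 60.0

# Synchronous: the same GPUs alternate between generation and
# training, so they do useful training work only part of the time.
sync_util = train_time / (gen_time + train_time)

# Asynchronous separation: generation runs on dedicated inference
# GPUs and overlaps with training, so trainer GPUs stay busy almost
# continuously (ignoring weight-sync pauses).
async_util = train_time / max(gen_time, train_time)

print(f"sync: {sync_util:.0%}, async: {async_util:.0%}")
```

With these assumed times the synchronous design lands at exactly the ~60% the analysis reports; the asynchronous figure stays below 100% in practice because of weight synchronization and off-policy staleness limits.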
Microsoft released an open-source framework that can optimize almost any AI agent with reinforcement learning, with little to no code changes. It supports arbitrary agent frameworks, including LangChain, AutoGen, and the Claude Agent SDK.
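The "little to no code changes" idea is usually achieved by instrumenting agent calls from the outside rather than rewriting the agent. A toy sketch of that pattern (the decorator and trajectory format here are hypothetical illustrations, not Microsoft's actual API):

```python
import functools

# Hypothetical sketch: wrap an existing agent function so every call
# is recorded as a trajectory step for later RL training, without
# touching the agent's own code. Not the actual Microsoft API.
TRAJECTORIES = []

def record_for_rl(agent_fn):
    """Decorator: log (input, output) pairs for an RL trainer."""
    @functools.wraps(agent_fn)
    def wrapped(*args, **kwargs):
        output = agent_fn(*args, **kwargs)
        TRAJECTORIES.append({"input": args, "output": output})
        return output
    return wrapped

# An unmodified "agent" from any framework gets wrapped like this:
@record_for_rl
def toy_agent(question):
    return f"answer to: {question}"

toy_agent("What is 2+2?")
print(len(TRAJECTORIES))  # one recorded step
```

Because the wrapper sits at the call boundary, the same instrumentation works whether the inner function is a LangChain chain, an AutoGen agent, or a plain Python function.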